NSF PAR Search | NSF Public Access Repository


Causal Dataset Discovery with Large Language Models

https://doi.org/10.1145/3665939.3665968

Liu, Junfei; Sun, Shaotong; Nargesian, Fatemeh (June 2024, ACM)

Full Text Available
Data distribution tailoring revisited: cost-efficient integration of representative data

https://doi.org/10.1007/s00778-024-00849-w

Chang, Jiwon; Cui, Bohan; Nargesian, Fatemeh; Asudeh, Abolfazl; Jagadish, H V (September 2024, The VLDB Journal)

Full Text Available
FairEM360: A Suite for Responsible Entity Matching

Shahbazi, Nima; Erfanian, Mahdi; Asudeh, Abolfazl; Nargesian, Fatemeh; Srivastava, Divesh (August 2024, Proceedings of the VLDB Endowment)

Full Text Available
PLUTUS: Understanding Data Distribution Tailoring for Machine Learning

https://doi.org/10.1145/3626246.3654745

Chang, Jiwon; Dionysio, Christina; Nargesian, Fatemeh; Boehm, Matthias (June 2024, ACM)

Full Text Available
Efficient Discovery of Temporal Inclusion Dependencies in Wikipedia Tables

Bornemann, Leon; Bleifuß, Tobias; Kalashnikov, Dmitri V; Nargesian, Fatemeh; Naumann, Felix; Srivastava, Divesh (March 2024, Advances in database technology)

Full Text Available
Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching

https://doi.org/10.14778/3611479.3611525

Shahbazi, Nima; Danevski, Nikola; Nargesian, Fatemeh; Asudeh, Abolfazl; Srivastava, Divesh (July 2023, Proceedings of the VLDB Endowment)

Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are over-represented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy that, while various fairness definitions are valuable in different settings, the class-imbalanced nature of EM means that measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.
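The two measures named at the end of the abstract can be illustrated with a minimal sketch. It assumes each EM decision is recorded as a hypothetical `(group, y_true, y_pred)` triple with 0/1 labels, and it reports the gap between the best and worst group for each measure. The function names and the toy records are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def group_rates(records):
    """Compute per-group PPV and TPR from (group, y_true, y_pred) triples."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]  # touching the key registers the group
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1
        # true negatives affect neither PPV nor TPR, so they are not counted
    rates = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        rates[group] = {
            # positive predictive value: TP / (TP + FP)
            "ppv": c["tp"] / predicted_pos if predicted_pos else None,
            # true positive rate: TP / (TP + FN)
            "tpr": c["tp"] / actual_pos if actual_pos else None,
        }
    return rates


def parity_gap(rates, measure):
    """Largest difference in `measure` between any two groups (0 means parity)."""
    values = [r[measure] for r in rates.values() if r[measure] is not None]
    return max(values) - min(values)


# Toy records (made up for illustration): (group, true label, predicted label)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 1, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]
rates = group_rates(records)
print(rates)
print("PPV gap:", parity_gap(rates, "ppv"))
print("TPR gap:", parity_gap(rates, "tpr"))
```

Neither measure uses true negatives, which dominate in EM because most record pairs are non-matches. That is one plausible reading of why the abstract singles these measures out under class imbalance.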
Full Text Available
Approximate Query Answering over Open Data

https://doi.org/10.1145/3597465.3605227

Zhang, Mengqi; Mundra, Pranay; Chikweze, Chukwubuikem; Nargesian, Fatemeh; Weikum, Gerhard (June 2023, HILDA'23: The SIGMOD 2023 Workshop on Human-in-the-Loop Data Analytics)

Open knowledge, including open data and publicly available knowledge bases, offers data scientists a rich opportunity for analysis and query answering, but comes with major obstacles due to the diverse, noisy, and incomplete nature of its data ecosystem. This paper proposes a vision for enabling approximate QUery answering over Open Knowledge (Quok), with a focus on supporting analytic tasks that involve identifying relevant data and computing aggregations. We define the problem, outline a system architecture, and discuss challenges and approaches to taming the uncertainty and incompleteness of open knowledge.
Full Text Available
Koios: Top-k Semantic Overlap Set Search

https://doi.org/10.1109/ICDE55515.2023.00121

Mundra, Pranay; Zhang, Jianhao; Nargesian, Fatemeh; Augsten, Nikolaus (April 2023, 2023 IEEE 39th International Conference on Data Engineering (ICDE))

We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose Koios, the first exact and efficient algorithm for semantic overlap search. Koios leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets less than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
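The definition of semantic overlap as a maximum bipartite matching score can be made concrete with a small sketch. This is a brute-force illustration for tiny sets only, not the Koios algorithm and without any of its filters. The 2-D embeddings in `EMB` are made-up values, and the similarity function is assumed to be symmetric.

```python
from itertools import permutations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vanilla_overlap(query, candidate):
    """Syntactic overlap: number of elements that match exactly."""
    return len(set(query) & set(candidate))


def semantic_overlap(query, candidate, sim):
    """Maximum-weight bipartite matching score, found by exhaustive search.

    Exponential in the set size, so suitable only for illustration.
    Assumes `sim` is symmetric; negative similarities are clipped to 0,
    which is equivalent to leaving that pair unmatched.
    """
    small, large = sorted((list(query), list(candidate)), key=len)
    best = 0.0
    # Try every way of assigning each element of the smaller set
    # to a distinct element of the larger set.
    for perm in permutations(large, len(small)):
        score = sum(max(sim(a, b), 0.0) for a, b in zip(small, perm))
        best = max(best, score)
    return best


# Hypothetical toy embeddings, chosen only so the cosines are easy to check.
EMB = {
    "car": (1.0, 0.0),
    "tree": (0.0, 1.0),
    "auto": (4.0, 3.0),
    "oak": (3.0, 4.0),
}


def sim(a, b):
    return cosine(EMB[a], EMB[b])


query = {"car", "tree"}
candidate = {"auto", "oak"}
print(vanilla_overlap(query, candidate))        # no exact matches
print(semantic_overlap(query, candidate, sim))  # car-auto plus tree-oak
```

The two sets share no tokens, so a token index would never surface the candidate, yet the best matching pairs "car" with "auto" and "tree" with "oak". The exhaustive search also hints at why verification is costly and why selective filters matter.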
Full Text Available
Next-generation Challenges of Responsible Data Integration

https://doi.org/10.1145/3539597.3572727

Nargesian, Fatemeh; Asudeh, Abolfazl; Jagadish, H. V. (February 2023, WSDM '23: Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining)

Full Text Available
Matching Roles from Temporal Data: Why Joe Biden is not only President, but also Commander-in-Chief

https://doi.org/10.1145/3588919

Bornemann, Leon; Bleifuß, Tobias; Kalashnikov, Dmitri V.; Nargesian, Fatemeh; Naumann, Felix; Srivastava, Divesh (May 2023, Proceedings of the ACM on Management of Data)

We present role matching, a novel, fine-grained integrity constraint on temporal fact data, i.e., (subject, predicate, object, timestamp)-quadruples. A role is a combination of subject and predicate and can be associated with different objects as the real world evolves and the data changes over time. A role matching states that the associated object of two or more roles should always match across time. Once discovered, role matchings can serve as integrity constraints to improve data quality, for instance of structured data in Wikipedia [3]. If violated, role matchings can alert data owners or editors and thus allow them to correct the error. Finding all role matchings is challenging due both to the inherent quadratic complexity of the matching problem and the need to identify true matches based on the possibly short history of the facts observed so far. To address the first challenge, we introduce several blocking methods both for clean and dirty input data. For the second challenge, the matching stage, we show how the entity resolution method Ditto [27] can be adapted to achieve satisfactory performance for the role matching task. We evaluate our method on datasets from Wikipedia infoboxes, showing that our blocking approaches can achieve 95% recall, while maintaining a reduction ratio of more than 99.99%, even in the presence of dirty data. In the matching stage, we achieve a macro F1-score of 89% on our datasets, using automatically generated labels.
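The notions of role, role matching, and reduction ratio can be sketched on toy data as follows. The blocking key used here (earliest timestamp together with its object) is a hypothetical stand-in chosen for brevity; it is not one of the paper's blocking methods, and the Ditto-based matching stage is not reproduced. All function names are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def role_histories(facts):
    """Group (subject, predicate, object, timestamp) quadruples by role."""
    histories = defaultdict(dict)
    for subject, predicate, obj, timestamp in facts:
        histories[(subject, predicate)][timestamp] = obj
    return histories


def is_role_matching(h1, h2):
    """True if two roles carry the same object at every shared timestamp."""
    shared = h1.keys() & h2.keys()
    return bool(shared) and all(h1[t] == h2[t] for t in shared)


def block_by_first_object(histories):
    """Hypothetical blocking: pair only roles whose earliest fact agrees."""
    blocks = defaultdict(list)
    for role, history in histories.items():
        first_t = min(history)
        blocks[(first_t, history[first_t])].append(role)
    candidates = set()
    for roles in blocks.values():
        candidates.update(combinations(sorted(roles), 2))
    return candidates


def reduction_ratio(num_candidates, num_roles):
    """Fraction of all role pairs that blocking avoids comparing."""
    total = num_roles * (num_roles - 1) // 2
    return 1 - num_candidates / total if total else 0.0


# Toy facts echoing the paper's title example.
facts = [
    ("USA", "president", "Donald Trump", 2017),
    ("USA", "president", "Joe Biden", 2021),
    ("USA", "commander_in_chief", "Donald Trump", 2017),
    ("USA", "commander_in_chief", "Joe Biden", 2021),
    ("USA", "vice_president", "Mike Pence", 2017),
    ("USA", "vice_president", "Kamala Harris", 2021),
]
histories = role_histories(facts)
candidates = block_by_first_object(histories)
for r1, r2 in candidates:
    print(r1, r2, is_role_matching(histories[r1], histories[r2]))
print("reduction ratio:", reduction_ratio(len(candidates), len(histories)))
```

With three roles there are three possible pairs, and the blocking step keeps only the president and commander-in-chief pair. On real data the same ratio is what the abstract reports as exceeding 99.99%.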
Full Text Available
